Efficient Scheduling for Parallel Memory Hierarchies
Authors
Abstract
This paper presents a scheduling algorithm for efficiently implementing nested-parallel computations on parallel memory hierarchies (trees of caches). To capture the cache cost of nested-parallel computations, we introduce a parallel version of the ideal cache model. In this model, algorithms can be written cache-obliviously (no choices are made based on machine parameters) and analyzed using a single level of cache with parameters $Z$ (cache size) and $L$ (cache line size), and a parameter $\alpha$ specifying the algorithm's parallelism (for input size $n$, $n^{\alpha}$ represents the number of processors that can be effectively used). For several fundamental algorithms we show that the cache cost in the parallel ideal cache model is optimal, matching the sequential bounds, with parallelism $\alpha \to 1$. For example, for cache-oblivious sorting of $n$ keys, the cache cost is $Q(n; Z, L) = \Theta((n/L) \log_{Z+2} n)$. Our scheduler guarantees that the number of misses across all caches at each level $i$ of the machine's hierarchy is at most the cache cost $Q(n; Z_i/3, L_i)$ as analyzed for an algorithm. Machine hierarchies are modeled as trees of caches using a symmetric variant of the parallel memory hierarchy (PMH) model. In this model, every cache at level $i$ has size $Z_i$, line size $L_i$, transfer cost $C_i$ (the cost of fetching a line of data from its parent cache at level $i + 1$), and child fanout $f_i$. Each leaf node (level 0) is a processor, with parameters set so that its cost corresponds to the processor's work (i.e., its instruction count). Finally, we show that if the algorithm parallelism exceeds the machine parallelism (as defined in the paper), the work, including the cost of cache misses, is balanced. In particular, for an $h$-level memory hierarchy, our scheduler guarantees a total runtime of $T(n) = O\left(\frac{\sum_{i=0}^{h-1} C_i \hat{Q}_{\alpha}(n; Z_i/3, L_i)}{P}\right)$ on $P = \prod_{i=1}^{h} f_i$ processors, where $\hat{Q}_{\alpha}$ is a "balanced" variant of $Q$.
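As a rough illustration of how the quantities in the abstract fit together, the following Python sketch evaluates the expression inside the runtime bound for a small machine description. It is only a sketch under stated assumptions: the names `CacheLevel`, `sort_cache_cost`, `num_processors`, and `runtime_bound` are hypothetical, the sorting cost $(n/L) \log_{Z+2} n$ is used with constants dropped, and the plain cost $Q$ stands in for the balanced variant $\hat{Q}_{\alpha}$, whose definition is not given in the abstract.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CacheLevel:
    """Parameters of one level i of the hierarchy (hypothetical container)."""
    Z: float  # cache size Z_i
    L: float  # cache line size L_i
    C: float  # transfer cost C_i for fetching a line from level i + 1


def sort_cache_cost(n: float, Z: float, L: float) -> float:
    """Sorting cache cost (n / L) * log_{Z+2}(n), with constant factors dropped."""
    return (n / L) * math.log(n, Z + 2)


def num_processors(fanouts: List[int]) -> int:
    """P = product of the fanouts f_1 .. f_h."""
    return math.prod(fanouts)


def runtime_bound(
    n: float,
    levels: List[CacheLevel],   # levels 0 .. h-1
    fanouts: List[int],         # fanouts f_1 .. f_h
    Q: Callable[[float, float, float], float] = sort_cache_cost,
) -> float:
    """Evaluate sum_i C_i * Q(n; Z_i / 3, L_i) / P.

    Assumption: Q is used in place of the balanced variant from the paper,
    so this is an illustration of the formula's shape, not the exact bound.
    """
    total = sum(lvl.C * Q(n, lvl.Z / 3, lvl.L) for lvl in levels)
    return total / num_processors(fanouts)


if __name__ == "__main__":
    # Hypothetical two-level machine; all numbers are made up for illustration.
    levels = [CacheLevel(Z=3, L=1, C=1), CacheLevel(Z=3 * 2**15, L=64, C=10)]
    fanouts = [4, 8]
    print(num_processors(fanouts))
    print(runtime_bound(2**20, levels, fanouts))
```

The division by $Z_i/3$ mirrors the scheduler's guarantee that misses at level $i$ are bounded by the cost analyzed for a cache one third the size. Any other cost function with the same `(n, Z, L)` signature can be passed as `Q`.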
Similar resources
Evaluation of scheduling solutions in parallel processing using DEA FDH model
This paper presents a new application of data envelopment analysis (DEA): evaluating scheduling solutions for parallel processing using the non-convex DEA model known as the free disposal hull (FDH) model. By treating each scheduling solution as a decision-making unit (DMU) with relevant inputs and outputs, the paper shows how the most efficient schedule(s) can be identified.
Solving the Problem of Scheduling Unrelated Parallel Machines with Limited Access to Jobs
Nowadays, with the successful application of the on-time production concept to areas such as production management and storage, completing jobs by their delivery time is considered a key issue in industrial environments. Unrelated parallel machine scheduling is a general form of the classic parallel machine problems. In some of the applications of unrelated parallel mac...
Experiments with the Fresh Breeze Tree-Based Memory Model (University of Delaware, Department of Electrical and Computer Engineering, Computer Architecture and Parallel Systems Laboratory)
Recent developments have brought to the forefront some pressing and difficult problems concerning the usability of computer systems: lack of a satisfactory general purpose programming model for parallel computation; how to achieve efficient utilization of processing and memory resources; and system resilience in the presence of malicious attacks and the expectation that future hardware will be ...
Flexible flowshop scheduling with equal number of unrelated parallel machines
This article addresses a multi-stage flowshop scheduling problem with equal number of unrelated parallel machines. The objective is to minimize the makespan for a given set of jobs in the system. This problem class is NP-hard in the strong sense, so a hybrid heuristic method for sequencing and then allocating operations of jobs to machines is developed. A number of test problems are randomly ge...
Publication date: 2010